AITopics | score 2

OPOR-Bench: Evaluating Large Language Models on Online Public Opinion Report Generation

Yu, Jinzheng, Xu, Yang, Li, Haozhen, Li, Junqi, Feng, Yifan, Zhu, Ligu, Shen, Hao, Shi, Lei

arXiv.org Artificial Intelligence, Dec-2-2025

Online Public Opinion Reports consolidate news and social media for timely crisis management by governments and enterprises. While large language models have made automated report generation technically feasible, systematic research in this specific area remains notably absent, particularly lacking formal task definitions and corresponding benchmarks. To bridge this gap, we define the Automated Online Public Opinion Report Generation (OPOR-GEN) task and construct OPOR-BENCH, an event-centric dataset covering 463 crisis events with their corresponding news articles, social media posts, and a reference summary. To evaluate report quality, we propose OPOR-EVAL, a novel agent-based framework that simulates human expert evaluation by analyzing generated reports in context. Experiments with frontier models demonstrate that our framework achieves high correlation with human judgments. Our comprehensive task definition, benchmark dataset, and evaluation framework provide a solid foundation for future research in this critical domain.

large language model, machine learning, natural language, (19 more...)

2512.01896

Country:

Europe (0.93)
Asia > China (0.28)

Genre: Research Report > New Finding (0.93)

Industry:

Media > News (1.00)
Health & Medicine > Therapeutic Area (0.93)
Information Technology (0.93)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

TRUEBench: Can LLM Response Meet Real-world Constraints as Productivity Assistant?

Park, Jiho, Song, Jongyoon, Choi, Minjin, Heo, Kyuho, Huh, Taehun, Kim, Ji Won

arXiv.org Artificial Intelligence, Sep-30-2025

Large language models (LLMs) are increasingly integral as productivity assistants, but existing benchmarks fall short in rigorously evaluating their real-world instruction-following capabilities. Current benchmarks often (i) lack sufficient multilinguality, (ii) fail to capture the implicit constraints inherent in user requests, and (iii) overlook the complexities of multi-turn dialogue. To address these critical gaps and provide a more realistic assessment, we introduce TRUEBench (Trustworthy Real-world Usage Evaluation Benchmark), a novel benchmark specifically designed for LLM-based productivity assistants. TRUEBench distinguishes itself by featuring input prompts across 12 languages, incorporating intra-instance multilingual instructions, employing rigorous evaluation criteria to capture both explicit and implicit constraints, and including complex multi-turn dialogue scenarios with both accumulating constraints and context switches. Furthermore, to ensure reliability in evaluation, we refined constraints using an LLM validator. Extensive experiments demonstrate that TRUEBench presents significantly greater challenges than existing benchmarks; for instance, a strong model like OpenAI o1 achieved only a 69.07% overall pass rate. TRUEBench offers a demanding and realistic assessment of LLMs in practical productivity settings, highlighting their capabilities and limitations.

large language model, machine learning, natural language, (22 more...)

2509.22715

Country:

Europe (1.00)
North America > United States (0.92)
Asia (0.68)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Insufficient Statistics Perturbation: Stable Estimators for Private Least Squares

Brown, Gavin, Hayase, Jonathan, Hopkins, Samuel, Kong, Weihao, Liu, Xiyang, Oh, Sewoong, Perdomo, Juan C., Smith, Adam

arXiv.org Machine Learning, Apr-23-2024

We present a sample- and time-efficient differentially private algorithm for ordinary least squares, with error that depends linearly on the dimension and is independent of the condition number of $X^\top X$, where $X$ is the design matrix. All prior private algorithms for this task require either $d^{3/2}$ examples, error growing polynomially with the condition number, or exponential time. Our near-optimal accuracy guarantee holds for any dataset with bounded statistical leverage and bounded residuals. Technically, we build on the approach of Brown et al. (2023) for private mean estimation, adding scaled noise to a carefully designed stable nonprivate estimator of the empirical regression vector.
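
For orientation, the standard nonprivate objects that the abstract refers to can be written out as follows. These are textbook definitions, not the paper's algorithm; the symbols $y$ (response vector), $x_i$ (the $i$-th row of $X$), $\kappa$, and $h_i$ are notation assumed here, and $X^\top X$ is assumed invertible.

```latex
\hat{\beta} = (X^\top X)^{-1} X^\top y, \qquad
\kappa(X^\top X) = \frac{\lambda_{\max}(X^\top X)}{\lambda_{\min}(X^\top X)}, \qquad
h_i = x_i^\top (X^\top X)^{-1} x_i .
```

Here $\hat{\beta}$ is the ordinary least squares solution, $\kappa$ is the condition number that the stated error is independent of, and $h_i$ is the statistical leverage of example $i$, the quantity the accuracy guarantee assumes to be bounded.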

algorithm, dataset, supp, (17 more...)

2404.15409

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Middlesex County > New Brunswick (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Experimental Study (0.45)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Security & Privacy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

PAGER: A Framework for Failure Analysis of Deep Regression Models

Thiagarajan, Jayaraman J., Narayanaswamy, Vivek, Trivedi, Puja, Anirudh, Rushil

arXiv.org Machine Learning, Sep-19-2023

Safe deployment of AI models requires proactive detection of potential prediction failures to prevent costly errors. While failure detection in classification problems has received significant attention, characterizing failure modes in regression tasks is more complicated and less explored. Existing approaches rely on epistemic uncertainties or feature inconsistency with the training distribution to characterize model risk. However, we show that uncertainties are necessary but insufficient to accurately characterize failure, owing to the various sources of error. In this paper, we propose PAGER (Principled Analysis of Generalization Errors in Regressors), a framework to systematically detect and characterize failures in deep regression models. Built upon the recently proposed idea of anchoring in deep models, PAGER unifies both epistemic uncertainties and novel, complementary non-conformity scores to organize samples into different risk regimes, thereby providing a comprehensive analysis of model errors. Additionally, we introduce novel metrics for evaluating failure detectors in regression tasks. We demonstrate the effectiveness of PAGER on synthetic and real-world benchmarks. Our results highlight the capability of PAGER to identify regions of accurate generalization and detect failure cases in out-of-distribution and out-of-support scenarios.

artificial intelligence, machine learning, pager, (18 more...)

2309.10977

Country:

North America > United States > Michigan (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > New Finding (0.88)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.71)

Empirical Analysis of Model Selection for Heterogeneous Causal Effect Estimation

Mahajan, Divyat, Mitliagkas, Ioannis, Neal, Brady, Syrgkanis, Vasilis

arXiv.org Artificial Intelligence, Jun-12-2023

We study the problem of model selection in causal inference, specifically for the case of conditional average treatment effect (CATE) estimation under binary treatments. Unlike model selection in machine learning, there is no perfect analogue of cross-validation, as we do not observe the counterfactual potential outcome for any data point. To address this, a variety of proxy metrics have been proposed in the literature that depend on auxiliary nuisance models estimated from the observed data (propensity score model, outcome regression model). However, the effectiveness of these metrics has only been studied on synthetic datasets, as we can access the counterfactual data for them. We conduct an extensive empirical analysis to judge the performance of these metrics introduced in the literature, and novel ones introduced in this work, where we utilize the latest advances in generative modeling to incorporate multiple realistic datasets. Our analysis suggests novel model selection strategies based on careful hyperparameter tuning of CATE estimators and causal ensembling.
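
In the usual potential-outcomes notation (assumed here, not quoted from the paper), with binary treatment $T$, covariates $X$, and potential outcomes $Y(0)$ and $Y(1)$, the estimand and the observed outcome can be sketched as:

```latex
\tau(x) = \mathbb{E}\left[\, Y(1) - Y(0) \mid X = x \,\right], \qquad
Y = T\, Y(1) + (1 - T)\, Y(0).
```

Only $Y$ is observed, so for every data point one of $Y(0)$ and $Y(1)$ is missing. The per-sample error $\hat{\tau}(x_i) - (Y_i(1) - Y_i(0))$ therefore cannot be computed on held-out data, which is why proxy metrics built from nuisance models are used instead.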

artificial intelligence, dataset, machine learning, (15 more...)

2211.01939

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > India (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

Using Time-Series Privileged Information for Provably Efficient Learning of Prediction Models

Karlsson, Rickard, Willbo, Martin, Hussain, Zeshan, Krishnan, Rahul G., Sontag, David, Johansson, Fredrik D.

arXiv.org Machine Learning, Oct-28-2021

We study prediction of future outcomes with supervised models that use privileged information during learning. The privileged information comprises samples of time series observed between the baseline time of prediction and the future outcome; this information is available only at training time, which differs from traditional supervised learning. Our question is when using this privileged data leads to more sample-efficient learning of models that use only baseline data for predictions at test time. We give an algorithm for this setting and prove that when the time series are drawn from a non-stationary Gaussian-linear dynamical system of fixed horizon, learning with privileged information is more efficient than learning without it. On synthetic data, we test the limits of our algorithm and theory, both when our assumptions hold and when they are violated. On three diverse real-world datasets, we show that our approach is generally preferable to classical learning, particularly when data is scarce. Finally, we relate our estimator to a distillation approach both theoretically and empirically.

estimator, lupts, privileged information, (13 more...)

2110.14993

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > China > Liaoning Province > Shenyang (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)
(8 more...)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.67)

Industry:

Health & Medicine > Therapeutic Area > Oncology (0.93)
Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
Health & Medicine > Therapeutic Area > Neurology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.93)

Improving probability selecting based weights for Satisfiability Problem

Fu, Huimin, Xu, Yang, Liu, Jun, Wu, Guanfeng, Sutcliffe, Geoff

arXiv.org Artificial Intelligence, Jul-29-2020

The Boolean Satisfiability problem (SAT) is important to the artificial intelligence community, and solving it has an impact on complex problems. Recently, great breakthroughs have been made both on stochastic local search (SLS) algorithms for uniform random k-SAT, resulting in several state-of-the-art SLS algorithms (Score2SAT, YalSAT, ProbSAT, CScoreSAT), and on hybrid algorithms for hard random SAT (HRS), resulting in one state-of-the-art hybrid algorithm, SparrowToRiss. However, no algorithm can effectively solve both uniform random k-SAT and HRS. In this paper, we present a new SLS algorithm named SelectNTS for uniform random k-SAT and HRS. SelectNTS is an improved probability-selecting-based local search algorithm for the SAT problem. The core of SelectNTS relies on new clause and variable selection heuristics. The new clause selection heuristic uses a new clause weighting scheme and a biased random walk. The new variable selection heuristic uses a probability selecting strategy with a variation of the CC strategy based on a new variable weighting scheme. Extensive experimental results on the well-known random benchmark instances from the SAT Competitions in 2017 and 2018, and on randomly generated problems, show that our algorithm outperforms state-of-the-art random SAT algorithms and can effectively solve both uniform random k-SAT and HRS.
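
To make the idea of probability-based variable selection concrete, here is a minimal sketch of a generic stochastic local search loop in the style of ProbSAT. It is not SelectNTS: the clause weighting, biased random walk, and CC strategy from the abstract are omitted, and the function names, the polynomial weight `(eps + break_count) ** -cb`, and the default values of `cb` and `eps` are illustrative assumptions.

```python
import random

def is_satisfied(clause, assignment):
    # A literal v > 0 is true when variable v is True; v < 0 when variable |v| is False.
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)

def break_count(clauses, assignment, var):
    # Number of currently satisfied clauses that flipping `var` would falsify.
    touched = [c for c in clauses if any(abs(lit) == var for lit in c)]
    before = [is_satisfied(c, assignment) for c in touched]
    assignment[var] = not assignment[var]      # tentative flip
    after = [is_satisfied(c, assignment) for c in touched]
    assignment[var] = not assignment[var]      # undo the flip
    return sum(1 for b, a in zip(before, after) if b and not a)

def sls_solve(clauses, num_vars, max_flips=10_000, cb=2.3, eps=1.0, seed=0):
    # cb and eps are illustrative values (assumptions), not tuned settings.
    rng = random.Random(seed)
    assignment = {v: rng.random() < 0.5 for v in range(1, num_vars + 1)}
    for _ in range(max_flips):
        unsat = [c for c in clauses if not is_satisfied(c, assignment)]
        if not unsat:
            return assignment                  # every clause is satisfied
        clause = rng.choice(unsat)             # uniform pick, no clause weights
        variables = [abs(lit) for lit in clause]
        # Variables that break fewer clauses get a higher selection probability.
        weights = [(eps + break_count(clauses, assignment, v)) ** -cb
                   for v in variables]
        var = rng.choices(variables, weights=weights, k=1)[0]
        assignment[var] = not assignment[var]
    return None                                # flip budget exhausted

# Small satisfiable formula: (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
example_clauses = [[1, 2], [-1, 3], [-2, -3]]
example_model = sls_solve(example_clauses, num_vars=3)
```

Because every variable in the chosen clause keeps a nonzero weight, the search can always escape a local minimum, at the cost of sometimes making a flip that breaks other clauses.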

algorithm, artificial intelligence, selectnts, (18 more...)

2007.15185

Country:

Asia > China > Sichuan Province > Chengdu (0.04)
North America > United States (0.04)
Europe > United Kingdom > Northern Ireland (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)

Refined bounds for algorithm configuration: The knife-edge of dual class approximability

Balcan, Maria-Florina, Sandholm, Tuomas, Vitercik, Ellen

arXiv.org Artificial Intelligence, Jun-21-2020

Automating algorithm configuration is growing increasingly necessary as algorithms come with more and more tunable parameters. It is common to tune parameters using machine learning, optimizing performance metrics such as runtime and solution quality. The training set consists of problem instances from the specific domain at hand. We investigate a fundamental question about these techniques: how large should the training set be to ensure that a parameter's average empirical performance over the training set is close to its expected, future performance? We answer this question for algorithm configuration problems that exhibit a widely-applicable structure: the algorithm's performance as a function of its parameters can be approximated by a "simple" function. We show that if this approximation holds under the L-infinity norm, we can provide strong sample complexity bounds. On the flip side, if the approximation holds only under the L-p norm for p smaller than infinity, it is not possible to provide meaningful sample complexity bounds in the worst case. We empirically evaluate our bounds in the context of integer programming, one of the most powerful tools in computer science. Via experiments, we obtain sample complexity bounds that are up to 700 times smaller than the previously best-known bounds.
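
As a baseline for the training-set-size question, the sketch below computes the textbook Hoeffding bound with a union bound over a finite grid of candidate parameter settings. This is not the paper's result, which concerns richer parameter spaces; the function name, the `value_range` bound on performance, and the finite grid of `num_configs` settings are assumptions made for illustration.

```python
import math

def hoeffding_sample_size(eps, delta, value_range=1.0, num_configs=1):
    """Instances sufficient so that, with probability at least 1 - delta,
    every one of `num_configs` parameter settings has empirical average
    performance within `eps` of its expected performance.

    Assumes performance on each instance lies in an interval of width
    `value_range` and that instances are drawn i.i.d.
    """
    # Hoeffding: P(|mean - expectation| >= eps) <= 2 * exp(-2 n eps^2 / R^2).
    # A union bound over the settings replaces delta by delta / num_configs.
    n = value_range ** 2 * math.log(2.0 * num_configs / delta) / (2.0 * eps ** 2)
    return math.ceil(n)

single_setting = hoeffding_sample_size(eps=0.1, delta=0.05)
grid_of_ten = hoeffding_sample_size(eps=0.1, delta=0.05, num_configs=10)
```

The required sample size grows only logarithmically with the number of settings but quadratically as `eps` shrinks, which is why tighter structural bounds of the kind the abstract describes can matter in practice.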

algorithm, artificial intelligence, machine learning, (13 more...)

2006.11827

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Learning to Branch

Balcan, Maria-Florina, Dick, Travis, Sandholm, Tuomas, Vitercik, Ellen

arXiv.org Artificial Intelligence, May-16-2018

Tree search algorithms, such as branch-and-bound, are the most widely used tools for solving combinatorial and nonconvex problems. For example, they are the foremost method for solving (mixed) integer programs and constraint satisfaction problems. Tree search algorithms recursively partition the search space to find an optimal solution. In order to keep the tree size small, it is crucial to carefully decide, when expanding a tree node, which question (typically variable) to branch on at that node in order to partition the remaining space. Numerous partitioning techniques (e.g., variable selection) have been proposed, but there is no theory describing which technique is optimal. We show how to use machine learning to determine an optimal weighting of any set of partitioning procedures for the instance distribution at hand using samples from the distribution. We provide the first sample complexity guarantees for tree search algorithm configuration. These guarantees bound the number of samples sufficient to ensure that the empirical performance of an algorithm over the samples nearly matches its expected performance on the unknown instance distribution. This thorough theoretical investigation naturally gives rise to our learning algorithm. Via experiments, we show that learning an optimal weighting of partitioning procedures can dramatically reduce tree size, and we prove that this reduction can even be exponential. Through theory and experiments, we show that learning to branch is both practical and hugely beneficial.
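
The "weighting of partitioning procedures" can be illustrated with a minimal sketch that mixes two variable-scoring rules using a single weight `mu`. The function name, the dictionary representation of scores, and the two-rule restriction are assumptions for illustration; how `mu` is learned from sampled instances is the subject of the paper and is not shown.

```python
def choose_branching_variable(scores_1, scores_2, mu):
    """Pick the variable maximizing mu * score_1 + (1 - mu) * score_2.

    scores_1, scores_2: dicts mapping each candidate variable to the value
    assigned by two hypothetical scoring rules at the current tree node.
    mu: mixing weight in [0, 1].
    """
    combined = {v: mu * scores_1[v] + (1.0 - mu) * scores_2[v]
                for v in scores_1}
    return max(combined, key=combined.get)

# Two rules that disagree: rule 1 prefers x1, rule 2 prefers x2.
rule_1 = {"x1": 0.9, "x2": 0.2}
rule_2 = {"x1": 0.1, "x2": 0.8}
pick_rule_1_only = choose_branching_variable(rule_1, rule_2, mu=1.0)
pick_rule_2_only = choose_branching_variable(rule_1, rule_2, mu=0.0)
```

Setting `mu` to 1 or 0 recovers either rule on its own, so the learned mixture can do no worse than the better of the two rules on the training instances.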

artificial intelligence, score 1, score 2, (17 more...)

1803.1015

Country: North America > United States (0.67)

Genre: Research Report (0.81)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)

Her2 Challenge Contest: A Detailed Assessment of Automated Her2 Scoring Algorithms in Whole Slide Images of Breast Cancer Tissues

Qaiser, Talha, Mukherjee, Abhik, Pb, Chaitanya Reddy, Munugoti, Sai Dileep, Tallam, Vamsi, Pitkäaho, Tomi, Lehtimäki, Taina, Naughton, Thomas, Berseth, Matt, Pedraza, Aníbal, Mukundan, Ramakrishnan, Smith, Matthew, Bhalerao, Abhir, Rodner, Erik, Simon, Marcel, Denzler, Joachim, Huang, Chao-Hui, Bueno, Gloria, Snead, David, Ellis, Ian, Ilyas, Mohammad, Rajpoot, Nasir

arXiv.org Artificial Intelligence, Jul-24-2017

Evaluating expression of the Human epidermal growth factor receptor 2 (Her2) by visual examination of immunohistochemistry (IHC) on invasive breast cancer (BCa) is a key part of the diagnostic assessment of BCa due to its recognised importance as a predictive and prognostic marker in clinical practice. However, visual scoring of Her2 is subjective and consequently prone to inter-observer variability. Given the prognostic and therapeutic implications of Her2 scoring, a more objective method is required. In this paper, we report on a recent automated Her2 scoring contest, held in conjunction with the annual PathSoc meeting in Nottingham in June 2016, aimed at systematically comparing and advancing the state-of-the-art Artificial Intelligence (AI) based automated methods for Her2 scoring. The contest dataset comprised digitised whole slide images (WSI) of sections from 86 cases of invasive breast carcinoma stained with both Haematoxylin & Eosin (H&E) and IHC for Her2. The contesting algorithms automatically predicted scores of the IHC slides for an unseen subset of the dataset, and the predicted scores were compared with the 'ground truth' (a consensus score from at least two experts). We also report on a simple Man vs Machine contest for the scoring of Her2 and show that the automated methods could beat the pathology experts on this contest dataset. This paper presents a benchmark for comparing the performance of automated algorithms for scoring of Her2. It also demonstrates the enormous potential of automated algorithms in assisting the pathologist with objective IHC scoring.

algorithm, artificial intelligence, machine learning, (18 more...)

doi: 10.1111/his.13333

1705.08369

Country:

Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.14)
Asia > Singapore (0.14)
Europe > Ireland (0.04)
(7 more...)

Genre: Research Report > Experimental Study (0.46)

Industry: Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.91)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
